12 Abstract

13 In the coronavirus (CoV), the envelope spike (S) glycoprotein is responsible for CoV cell 

14 entry and host-to-host transmission. The S is a multifunctional glycoprotein that mediates 

15 both attachment of CoV particles to cell surface receptor molecules as well as membrane 

16 penetration by fusion. Receptor-binding domains (RBD) have been identified in the S of 

17 diverse CoV; they usually contain antigenic determinants targeted by antibodies that 

18 neutralize CoV infections. To penetrate host cells, the CoV can use various cell surface 

19 molecules, although they preferentially bind to ectoenzymes. Several crystal structures have 

20 determined the folding of CoV RBD and the mode by which they recognize cell entry 

21 receptors. Here we review the CoV-receptor complex structures reported to date, and 

22 highlight the distinct receptor recognition modes, common features, and key determinants of 

23 the binding specificity. Structural studies have established the basis for understanding 

24 receptor recognition diversity in CoV, its evolution and the adaptation of this virus family to

25 different hosts. CoV responsible for recent outbreaks have extraordinary potential for cross- 

26 species transmission; their RBD bear large platforms specialized in recognition of receptors 

27 from different species, which facilitates host-to-host circulation and adaptation to man. 


31 Key words: Coronavirus; virus entry; virus-receptor; virus neutralization; ectoenzymes; 

32 glycoproteins 




ACCEPTED MANUSCRIPT 


33 Introduction 

34 For productive entry into host cells, viruses attach to specific cell surface receptor 

35 molecules (Casasnovas, 2013; Marsh and Helenius, 2006). Selection of an entry receptor is 

36 governed by precise interactions that mediate efficient virus attachment to the cell surface as 

37 well as productive cell infection. Viruses can use a large number of cell surface molecules to 

38 penetrate host cells (Backovic and Rey, 2012); these molecules are the main determinants of 

39 virus tropism and pathogenesis. Receptor-binding motifs in viruses are subject to changes 

40 promoted by immune surveillance, which can target key receptor-binding residues during 

41 neutralization of virus infection. It is thus relatively common that a virus evolves to use

42 distinct cell entry receptors over the course of an infection, or that related viruses use 

43 different cell surface molecules for host cell entry (Stehle and Casasnovas, 2009). This is the 

44 case of coronavirus (CoV), whose use of distinct entry receptor molecules is responsible for 

45 their broad host range and tissue tropism (Gallagher and Buchmeier, 2001; Masters, 2006). 

46 Some CoV have remarkable capacity for cross-species transmission, which is linked to virus

47 adaptation to the use of orthologous receptor molecules (Graham and Baric, 2010; Holmes, 

48 2005). 

49 The CoV are a large family of enveloped, positive single-stranded RNA viruses involved 

50 in respiratory, enteric, hepatic and neuronal infectious diseases in animals and in man. The 

51 CoV are subdivided into four genera, alpha, beta, gamma and delta (de Groot et al., 2011; de 

52 Groot et al., 2013). Prototype viruses in each genus are transmissible gastroenteritis virus 

53 (TGEV, alpha1-CoV), human coronaviruses (hCoV-229E and hCoV-NL63, alpha-CoV),

54 mouse hepatitis virus (MHV, beta-CoV, lineage A), severe acute respiratory syndrome

55 coronavirus (SARS-CoV, beta-CoV, lineage B), Middle East respiratory syndrome

56 coronavirus (MERS-CoV, beta-CoV, lineage C), avian infectious bronchitis virus (IBV,

57 gamma-CoV) and bulbul coronavirus (delta-CoV). The CoV have a major envelope
58 glycoprotein, the spike (S), which is responsible for CoV cell entry and interspecies

59 transmission (Perlman and Netland, 2009). This glycoprotein mediates CoV particle 

60 attachment to cell surface molecules, as well as the fusion of virus and cell membranes 

61 (Masters, 2006). The S protein assembles into trimers, displayed as peplomers in the CoV 

62 envelope (Beniac et al., 2006); the protein has a membrane-distal globular N-terminal S1

63 portion and a stalk formed by the S2 region. The S1 region contains the receptor-binding

64 determinants, whereas S2 mediates virus-cell fusion for membrane penetration (Fig. 1). 

65 Like the class I fusion proteins, the S2 region adopts a helical structure, and is followed 

66 by the transmembrane domain (Bosch et al., 2003). S2 contains the fusion peptide and two 

67 conserved heptad repeat regions, HR1 (N-terminal) and HR2 (C-terminal) (Fig. 1), which 

68 form a coiled coil structure important for S trimerization and the fusion reaction during CoV 

69 cell entry (Supekar et al., 2004; Xu et al., 2004). The fusion peptide is N-terminal from the 

70 HR1 in the S2 sequence (Fig. 1), but the HR1-HR2 coiled coil structure places it close to the 

71 transmembrane region. As in other enveloped viruses, the initiation of the fusion reaction 

72 requires partial disassembly of the trimeric spikes and the exposure of the fusion peptide for 

73 binding to the host cell membrane (Belouzard et al., 2012; Beniac et al., 2007; Harrison, 

74 2005). In some MHV variants and in the SARS-CoV, the S protein is processed into S1 and

75 S2 fragments by cell proteases, which facilitate the fusion process and cell entry (Belouzard 

76 et al., 2012; Glowacka et al., 2011; Huang et al., 2006). The S of alpha-CoV is not 

77 processed. Receptor-mediated endocytosis and exposure to low pH is a necessary step for 

78 entry of TGEV, hCoV-229E and SARS-CoV (Masters, 2006). Other CoV, such as MHV and 

79 hCoV-NL63, do not require a low pH step for fusion, and the entry process is mediated by

80 receptor binding on the cell surface (Huang et al., 2006; Sturman et al., 1990). CoV can thus 

81 follow different entry pathways to penetrate host cells (Belouzard et al., 2012); receptor, low 

82 pH and proteases are three major inducers of membrane fusion, and CoV use them 
83 differentially for cell entry. Mutations in the S1 and S2 fragments indicate that differences

84 among CoV entry routes are probably related to variations in S trimer stability (Gallagher and 

85 Buchmeier, 2001). Nonetheless, the conformational changes in the CoV S that lead to 

86 membrane fusion and cell entry have not been defined. 

87 The S1 region is largely variable in sequence and length, and is specialized in recognition

88 of cell surface receptors (Fig. 1) (Li, 2012; Masters, 2006); it has several discrete modules or 

89 domains that can fold independently (Bonavia et al., 2003; Du et al., 2013; Godet et al., 1994; 

90 Li et al., 2005a; Reguera et al., 2011; Wu et al., 2009). Receptor-binding domains (RBD) can 

91 be located at the N- and/or C-terminal moieties of the S1 region (Li, 2012; Peng et al., 2011)

92 (Fig. 1). The S glycoprotein N-terminal domain (NTD) can function as an RBD (N-RBD); it

93 can be the only S1 domain engaged in receptor recognition or, in conjunction with C-terminal

94 RBD, can broaden tissue tropism of certain CoV. As entry receptors, the N-RBD can 

95 recognize sialic acids in some cases (Fig. 1) (Peng et al., 2011), whereas it binds to 

96 carcinoembryonic antigen cell adhesion molecules (CEACAM) in MHV (Williams et al., 

97 1991). The NTD in TGEV is responsible for its enteric tropism, absent in the related porcine 

98 respiratory CoV (PRCV) that lacks this domain (Sanchez et al., 1992). The NTD region 

99 adopts a galectin-like structure in two beta-CoV, and its fold might be conserved in alpha- 

100 and gamma-CoV, since glycan-binding activity has been reported for the three genera (Li,

101 2012; Schultze et al., 1996). 

102 In most CoV, the major determinants of cell tropism are found in the C-terminal portion of 

103 the S1 region (Masters, 2006). These RBD can usually fold independently of the rest of the

104 S, and can be expressed as a single domain with all receptor-binding determinants (Du et al., 

105 2013; Reguera et al., 2011; Wong et al., 2004; Wu et al., 2009). Sequence and structure of 

106 the RBD vary considerably among CoV, and they recognize distinct receptors (Fig. 1). 

107 Several CoV of the genus alpha, including TGEV and hCoV-229E, use aminopeptidase N 
108 (APN) for cell entry (Delmas et al., 1992; Yeager et al., 1992), whereas hCoV-NL63 binds to

109 the human angiotensin-converting enzyme 2 (ACE2) (Wu et al., 2009). In the beta-CoV, the 

110 SARS- and the MERS-CoV use ACE2 and dipeptidyl peptidase 4 (DPP4, CD26) receptors, 

111 respectively (Li et al., 2003; Raj et al., 2013). APN, ACE2 and DPP4 are membrane-bound 

112 ectoenzymes with multiple functions such as angiogenesis, cell adhesion and blood pressure 

113 regulation (Boonacker and Van Noorden, 2003; Crackower et al., 2002; Mina-Osorio, 2008). 

114 The three proteins catalyze peptide-bond hydrolysis of short peptides. The reason for CoV 

115 use of ectoenzymes as entry receptors is unclear; it might be linked to their abundance on 

116 epithelial cells rather than to their peptidase function, which does not appear to be essential

117 for CoV cell entry (Li et al., 2005c). Virus-binding regions in these ectoenzymes are distant 

118 from the catalytic site (Li et al., 2005a; Lu et al., 2013; Peng et al., 2011; Reguera et al., 

119 2012; Wang et al., 2013; Wu et al., 2009). 

120 The identification of the CoV entry receptors and the RBD in the S glycoprotein led to 

121 structural characterization of the CoV-receptor interaction. RBD-receptor complexes have 

122 been determined for prototype alpha- (TGEV and hCoV-NL63) and beta-CoV (MHV, SARS- 

123 and MERS-CoV). RBD regions are targets of antibodies (Ab) that neutralize CoV infection, 

124 and their epitopes overlap receptor-binding motifs (Godet et al., 1994; He et al., 2005; 

125 Hwang et al., 2006; Pak et al., 2009; Prabakaran et al., 2006; Reguera et al., 2012). Some 

126 structural studies have determined how neutralizing Ab prevent CoV cell entry and infection. 

127 In this review, we will summarize the currently determined CoV-receptor complex structures, 

128 highlighting the distinct receptor recognition modes in this virus family. 

129 Alphacoronavirus recognition of cell entry receptors 

130 The alphacoronavirus (alpha-CoV) genus is a group of important animal and human 

131 viruses subdivided into several lineages (de Groot et al., 2011). The alpha1 lineage

132 comprises two types of canine (cCoV and cCoV-NTU336) and feline (fCoV and FIPV) CoV, 
133 PRCV and TGEV; another lineage includes human CoV hCoV-229E and hCoV-NL63, and

134 other members of the genus alpha are porcine epidemic diarrhea virus (PEDV) and some bat 

135 CoV. 

136 TGEV, one of the most studied alpha-CoV, has enteric and respiratory tropism. The 

137 enteric tropism is linked to its NTD, since a deletion mutant of TGEV (the homologous 

138 PRCV) shows only respiratory tropism (Sanchez et al., 1992). NTD binding to an attachment 

139 factor (sialic acid) is thought to be responsible for its enteric tropism (Schultze et al., 1996). 

140 TGEV, PRCV and the related animal alpha1-CoV use APN for host cell entry (Fig. 1). APN

141 is also the receptor for hCoV-229E (Delmas et al., 1992; Yeager et al., 1992), one of the first 

142 human CoV discovered, which is responsible for common colds (Kahn and McIntosh, 2005). 

143 The related hCoV-NL63 does not bind to APN and recognizes the cell surface ACE2 

144 ectoenzyme (Fig. 1) (Smith et al., 2006), like the SARS-CoV (Li et al., 2003). The cell 

145 surface receptors of PEDV and other alpha-CoV are currently unknown.

146 The RBD in alpha-CoV 

147 The alpha-CoV RBD are modules of ~150 residues located near the C-terminal portion

148 of the S1 region (Fig. 1) (Breslin et al., 2003; Godet et al., 1994; Wu et al., 2009). The RBD

149 can be expressed independently of the S; binding studies with receptors and Ab show that the 

150 RBD preserves its native conformation and binding specificity (Reguera et al., 2011; Wu et 

151 al., 2009). Preparation of single RBD proteins facilitates their crystallization in complex with 

152 receptors and Ab. 

153 The crystal structures of hCoV-NL63, PRCV and TGEV RBD have been determined 

154 (Reguera et al., 2012; Wu et al., 2009). They show a single domain unit that has a β-barrel

155 fold with two highly twisted β-sheets (Fig. 2). In one β-sheet, three β-strands (β1, β3 and β7)

156 run parallel (Fig. 2A). The three RBD have three disulphide bonds. In the crystal structure

157 of the TGEV RBD, solved at high resolution, the bent β-strand 5 (β5) crosses both β-sheets
158 (Fig. 2A). N-linked glycans cluster at one side of the β-barrel; the opposite side is not

159 glycosylated and might be closer to other S protein domains. N- and C-terminal ends of the 

160 RBD are located on the same side of the domain (terminal side); at the opposite side, two β-

161 turns form the tip of the barrel in the TGEV RBD (Fig. 2A). This region of the β-barrel

162 domain contacts the receptor (see below) and its conformation in the APN-binding RBD of 

163 TGEV and PRCV differs from the ACE2-binding region in the hCoV-NL63 domain (Fig. 2B, 

164 2C). These differences probably determine the distinct receptor-binding specificities of 

165 alpha-CoV. The TGEV or PRCV RBD tips are formed by two protruding β-turns (β1-β2 and

166 β3-β4), each bearing a solvent-exposed aromatic residue (tyrosine or tryptophan) (Fig. 2A,

167 2B). In contrast, the hCoV-NL63 RBD tip has a slightly recessed conformation, with the 

168 aromatic residues at the center of the receptor-binding surface (Fig. 2C). 

169 Alpha-CoV recognition of APN and ACE2 receptors

170 Crystal structures have been reported for complexes of alpha-CoV RBD with the APN 

171 and ACE2 ectodomains (Reguera et al., 2012; Wu et al., 2009). The RBD of these viruses 

172 contact receptor regions distal to the cell membrane (Fig. 3). 

173 The APN ectodomain is composed of four domains (DI-DIV), is heavily glycosylated and 

174 forms dimers through extensive DIV-DIV interactions (Fig. 3A). Each APN monomer has an 

175 RBD bound in the crystal structure of the PRCV RBD-APN complex (Fig. 3A). The 

176 bidentate, protruding tip contacts the APN, and the exposed side chains of the tyrosine and 

177 tryptophan residues penetrate small cavities of the APN ectodomain. The tyrosine side chain 

178 fits between an α-helix and a carbohydrate N-linked to the APN, whereas the bulky

179 tryptophan is in a narrow cavity formed at the DII-DIV junction (Fig. 3A). In addition to the 

180 tyrosine, other RBD residues contact the first N-acetyl glucosamine (NAG) linked to the 

181 porcine APN Asn736, and fix the glycan conformation. The CoV tyrosine and tryptophan 

182 residues are critical for TGEV RBD binding to the APN (Reguera et al., 2012), and 
183 preliminary results indicated that they are essential for virus entry and infection (unpublished

184 data). CoV recognition of APN is species-specific, and specificity is linked to the APN N-

185 linked glycan that interacts with the RBD β1-β2 turn in the structure (Reguera et al., 2012;

186 Tusell et al., 2007). Porcine, feline and canine alpha-CoV with a tyrosine at the β1-β2 turn

187 recognize APN proteins bearing the glycan. The large degree of sequence conservation in the 

188 RBD tip of alpha1-CoV also suggests a highly conserved APN recognition mode (Reguera et

189 al., 2012). hCoV-229E does not have a tyrosine in its RBD β1-β2 turn, however, and it

190 recognizes the human APN that lacks this glycosylation (Reguera et al., 2012; Tusell et al.,

191 2007). The conformation of this alpha-CoV RBD tip differs from that of alpha1-CoV,

192 suggesting that hCoV-229E recognition of APN must be unique. It is nonetheless likely that 

193 this human alpha-CoV preserves a protruding tip for binding to small APN cavities. 

194 hCoV-NL63 RBD interacts with the ACE2 ectodomain opposite to the way that the alpha- 

195 CoV bind to APN. The hCoV-NL63 RBD has a blunt tip that contacts protruding regions of 

196 the receptor (Fig. 3B). In the middle of the interacting surface, the depressed center of the 

197 RBD tip contacts a unique receptor β-turn (β4-β5), which interacts with a tyrosine and a

198 tryptophan in the virus protein (Fig. 3B). The rims of the RBD tip bind to two α-helices of

199 the ACE2 receptor. Specificity is determined by several hydrogen bonds that engage amino 

200 and carbonyl groups in the main chains of the interacting molecules (Fig. 3B). 

201 Alpha-CoV use protruding RBD regions to bind APN or recessed surfaces to recognize 

202 exposed ACE2 motifs (Fig. 2, 3). Crystal structures demonstrate that the conformation of the 

203 receptor-binding region in the alpha-CoV S must be the principal determinant of its receptor 

204 recognition specificity. We recently demonstrated that the RBD tip is a principal antigenic 

205 determinant (site A) in the S of TGEV and related alpha-CoV (Reguera et al., 2012). Potent 

206 neutralizing Ab of porcine CoV cluster at site A (Delmas et al., 1990; Sune et al., 1990). 

207 These Ab recognize the RBD tip and bind to the tyrosine or the tryptophan essential for APN 
208 binding (Reguera et al., 2012). These data suggest that the conformation of the alpha-CoV

209 receptor-binding region evolved under pressure from the immune system, particularly in 

210 humans, leading to small variations in the way hCoV-229E recognizes the APN protein 

211 (Tusell et al., 2007) or to radical changes that modified receptor specificity in hCoV-NL63. 

212 Betacoronavirus recognition of cell entry receptors 

213 The betacoronavirus (beta-CoV) genus comprises four lineages, A (MHV, hCoV-HKU1

214 and the beta1-CoV), B (SARS-CoV), C (batCoV and MERS-CoV), and D (batCoV HKU9)

215 (de Groot et al., 2011). The most representative CoV prototypes of this genus are hCoV-

216 OC43 (beta1-CoV), MHV, SARS-CoV and the recently identified MERS-CoV. Members of

217 lineage A CoV incorporate an extra, short spike-like glycoprotein in their envelope, the 

218 hemagglutinin esterase (HE) (Masters, 2006; Qinghong et al., 2008). 

219 hCoV-OC43 causes the common cold and pneumonia in elderly populations, as well as severe

220 lower respiratory tract infection in immunocompromised patients (Kahn and McIntosh, 

221 2005). Like bovine CoV (bCoV), another beta1-CoV, it uses sialic acids (N-acetyl-9-O-

222 acetylneuraminic acid, Neu5,9Ac2) as entry receptors (Fig. 1) (Krempl et al., 1995). Before

223 SARS, MHV was the most studied beta-CoV in vitro and in vivo, especially in laboratory

224 mice. MHV strains cause specific inflammations in several mouse organs, such as the

225 neurotropic strains JHM and A59 responsible for acute encephalitis and chronic 

226 demyelination in survivors, which serve as a model for the study of multiple sclerosis (Weiss 

227 and Leibowitz, 2011). The MHV cell entry receptor is a member of the CEACAM family 

228 (Williams et al., 1991). 

229 The SARS-CoV brought coronavirology to the center of the research community’s 

230 attention due to a worldwide epidemic with very high mortality rates (Gallagher and Perlman, 

231 2013). It uses ACE2 as the entry receptor (Li et al., 2003). Epidemiologists believe that 

232 SARS virus originated in bats (natural reservoir), was then transmitted to palm civets, ferret 
233 badgers, and raccoon dogs (amplification and transmission hosts) and then introduced into

234 man (Li et al., 2005b). SARS-CoV adaptation to different species and its transmission to 

235 humans is linked to subtle changes in the S glycoprotein, which increased its binding affinity 

236 for human ACE2 (Li et al., 2005c). 

237 MERS-CoV emerged in Saudi Arabia a decade after the SARS epidemic. It shares 90% 

238 sequence identity with batCoV-HKU4 and -HKU5, and it docks in beta-CoV lineage C (de 

239 Groot et al., 2013). Given this relationship, it is likely that MERS-CoV originated from bats 

240 (Raj et al., 2014). This virus uses DPP4 as a cell entry receptor (Raj et al., 2013). BatCoV- 

241 HKU4 recognizes the human DPP4 protein, indicating possible direct transmission from bats 

242 to humans (Wang et al., 2014; Yang et al., 2014). Recent evidence nonetheless shows 

243 involvement of dromedary camels as intermediates in virus transmission from bats to man 

244 (Doremalen et al., 2014; Haagmans et al., 2014). Human-to-human transmission is not 

245 frequent, probably because of low DPP4 expression in the human lower respiratory tract (Raj 

246 et al., 2014). 

247 Receptor recognition by the SARS-CoV 

248 Several crystal structures show the folding of the SARS-CoV RBD, the mode by which 

249 this virus recognizes its ACE2 entry receptor, and how Ab prevent virus binding to the 

250 receptor. These studies led to improved understanding of host-host transmission and 

251 adaptation of this CoV to humans, and also indicated strategies used by the SARS-CoV to 

252 evade neutralization by the immune system. 

253 The SARS-CoV RBD 

254 The SARS-CoV RBD is defined as a ~200-residue fragment in the C-terminal portion of 

255 the S1 region (Fig. 1) (Wong et al., 2004). It is composed of two subdomains; the core has a

256 central five-stranded β-sheet surrounded by polypeptides that connect the β-strands (Fig. 4A,

257 yellow). It has three small α-helices (A to C) and three disulphide bridges. A second




258 subdomain of ~65 residues inserts between two central β-strands of the core (β4 and β7), and

259 is distal to the terminal side of the domain (Fig. 4A, dark-red). This inserted subdomain lies

260 on one side of the core and comprises a central two-stranded β-sheet connected by a long

261 loop region; one side of this loop and the β-sheet clamp the core. The β-sheet, the extensive

262 interactions with the core, and a disulphide bond in the most solvent-exposed region of the 

263 subdomain stabilize its structure (Fig. 4A). One crystal structure of the isolated SARS-CoV 

264 RBD shows that it can form dimers through the terminal side (Hwang et al., 2006). The 

265 dimerization surface in these crystals is relatively large (~1000 Å² buried surface area,

266 BSA/monomer) and the authors proposed that RBD dimers could crosslink S glycoprotein 

267 trimers. It is nonetheless unclear whether such oligomers are found on the virus envelope 

268 and could recognize ACE2. 

269 SARS-CoV binding to ACE2 

270 The ACE2 ectoenzyme is the cell entry receptor of SARS-CoV (Li et al., 2003). It is a 

271 type I membrane glycoprotein with an N-terminal extracellular domain built of two α-helical

272 lobes; the catalytic site with a coordinated zinc ion is located between the two lobes (Fig. 3B, 

273 4B). The ACE2 ectodomain shows some conformational movement, and substrate binding to 

274 the active site leads to a closed conformation (Towler et al., 2004). Drug binding to this 

275 active site does not affect SARS-CoV binding, in accordance with virus recognition of a 

276 single lobe (Li et al., 2005c) (Fig. 4B). 

277 The SARS-CoV RBD inserted subdomain is the main S glycoprotein receptor-binding 

278 motif (Li et al., 2005a) (Fig. 4); the ACE2-binding subdomain region forms a curved, 

279 elongated surface with the two-stranded β-sheet at the bottom (Fig. 4A). The interaction

280 buries 25 residues and about 860 Å² of the virus protein, and a similar surface (820 Å²) of the

281 ACE2 receptor. The ACE2-interactive surface of the SARS-CoV RBD is ~100 Å² larger than that

282 of hCoV-NL63, consistent with marked differences in kinetic dissociation rate constants,
283 which are an order of magnitude lower in SARS than in hCoV-NL63 (Li et al., 2005c; Wu et

284 al., 2009). Both viruses recognize overlapping ACE2 regions, including the N-terminal α-

285 helix (α1) and the β-turn formed by β4 and β5 strands (Fig. 3B, 4B). The central concave

286 SARS-CoV RBD surface cradles the ACE2 N-terminal α-helix, whereas the terminal side of

287 the subdomain interacts with the ACE2 β4-β5 turn and α10 (Fig. 4B, 4C). The interaction

288 includes at least 10 virus-receptor hydrophilic bonds, some of which engage the hydroxyl 

289 groups of RBD tyrosines that also mediate non-polar interactions with the receptor (Fig. 4C). 

290 There is an important virus-receptor hydrogen bond interaction between the ACE2 Lys353 

291 carbonyl and the main chain amino group of RBD Gly488 (Fig. 4C) (Li et al., 2005a). The 

292 lysine side chain amino group interacts with an RBD main chain carbonyl. This ACE2 lysine is absent

293 in mouse and rat ACE2 proteins, which are not recognized by the SARS-CoV. ACE2 

294 glycosylation is also a determinant of SARS-CoV species specificity (Li et al., 2005c). A 

295 glycan linked to rat ACE2 Asn82 prevents its use as an efficient virus receptor. Deletion of 

296 the glycan and the His353/Lys substitution convert rat ACE2 into a SARS-CoV receptor, 

297 showing that efficient ACE2 recognition is central to virus infection and host-to-host 

298 transmission (Holmes, 2005; Li et al., 2005a; Li et al., 2005c). 
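The dissociation-rate comparison made earlier in this section can be read in terms of complex lifetime. The following is a minimal sketch, assuming simple first-order dissociation of a 1:1 RBD-ACE2 complex; the function name and both rate constants are hypothetical placeholders chosen only to show a tenfold difference, and are not measured values for either virus.

```python
import math

def complex_half_life(k_off_per_s):
    """Half-life (s) of a 1:1 complex that dissociates with first-order rate k_off."""
    return math.log(2) / k_off_per_s

# Hypothetical rate constants (assumptions for illustration only);
# they are NOT measured values for SARS-CoV or hCoV-NL63.
k_off_slow = 1e-3  # 1/s, stands in for the slower-dissociating RBD
k_off_fast = 1e-2  # 1/s, stands in for the faster-dissociating RBD

t_slow = complex_half_life(k_off_slow)
t_fast = complex_half_life(k_off_fast)
ratio = t_slow / t_fast

print(f"half-lives: {t_slow:.0f} s vs {t_fast:.0f} s (ratio {ratio:.1f})")
```

Under this assumption, a dissociation rate constant that is an order of magnitude lower corresponds to a complex that persists about ten times longer, whatever the absolute values are.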

299 SARS-CoV emerged from bat CoV and was transmitted through palm civet CoV; cross- 

300 species transmission is linked to RBD changes that increased its affinity for human ACE2 

301 (Holmes, 2005; Li, 2013; Li et al., 2005a; Li et al., 2005c). Of the residues involved in 

302 SARS-CoV RBD binding to ACE2, only a few have a key role in SARS-CoV adaptation to 

303 man (Fig. 4C). Lys479/Asn and Ser487/Thr mutations are two key changes in the SARS-

304 CoV S glycoprotein for infection of human cells. Substitution at either of these residues

305 increases SARS-CoV RBD binding affinity to human ACE2 by 20- to 30-fold, whereas the 

306 double mutation has a synergistic effect, with a 1000-fold increase in interaction affinity (Li 

307 et al., 2005c). The Asn at position 479 is found in some civet CoV; it does not affect binding 
308 to civet ACE2, but increases SARS-CoV RBD affinity for the human protein (Li et al.,

309 2005c). Asn479 contacts the human ACE2 His34 and is relatively close to Lys31 in the N- 

310 terminal a-helix (Fig. 4C), which are Tyr34 and Thr31 in civet ACE2. The presence of a 

311 positively charged lysine rather than Asn in RBD position 479 does not complement the 

312 human ACE2 Lys31 and His34 residues. The crystal structure of SARS-CoV RBD in 

313 complex with human ACE2 demonstrates that the methyl group of the threonine at position 

314 487 establishes specific contacts with the ACE2 Tyr41 and Lys353 side chains, increasing

315 affinity for the human receptor (Fig. 4C) (Li et al., 2005a). The SARS-CoV that caused 

316 sporadic outbreaks in 2003-2004 has serine at position 487 and shows very poor human-to- 

317 human transmission. This phenotype was also associated with the Leu472/Pro substitution in 

318 the ACE2 contact region of the SARS-CoV RBD (Li et al., 2005a). Other RBD residues 

319 have some influence on cross-species transmission of SARS-CoV (Li, 2013). 
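The fold-changes in affinity quoted above can be placed on a free-energy scale to make the synergy explicit. The following is a minimal sketch, assuming the standard relation ΔΔG = -RT ln(fold increase in affinity) at 25 °C; the function name and the temperature are illustrative assumptions and do not come from the cited work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold(fold_increase, temp_k=298.15):
    """Binding free-energy change (kcal/mol) for a given fold increase in affinity.

    Negative values mean tighter binding. Hypothetical helper for illustration.
    """
    return -R_KCAL * temp_k * math.log(fold_increase)

# Single substitutions: 20- to 30-fold increase, as quoted in the text.
single_low = ddg_from_fold(20)
single_high = ddg_from_fold(30)

# Double mutant: 1000-fold increase, as quoted in the text.
double = ddg_from_fold(1000)

# If the two substitutions acted independently, their free energies would add,
# which is the same as multiplying the fold changes (400- to 900-fold).
additive_low = 2 * single_low
additive_high = 2 * single_high

print(f"single: {single_low:.2f} to {single_high:.2f} kcal/mol")
print(f"double (1000-fold): {double:.2f} kcal/mol")
print(f"additive expectation: {additive_low:.2f} to {additive_high:.2f} kcal/mol")
```

With these assumptions each single substitution contributes roughly 1.8 to 2.0 kcal/mol, and the 1000-fold double mutant falls slightly beyond the range expected from simple additivity, which is consistent with the synergistic effect described.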

320 Structural basis of SARS-CoV neutralization by antibodies

321 The RBD is a major antigenic determinant in the S glycoprotein of the SARS-CoV (Du et 

322 al., 2009). Potent human and mouse SARS-CoV neutralizing Ab target the RBD and prevent 

323 virus infection by blocking its binding to the ACE2 receptor (He et al., 2005; Zhu et al., 

324 2007). The RBD can elicit broadly neutralizing Ab against diverse isolates, and human 

325 monoclonal Ab (mAb) can protect from infection by various zoonotic and human SARS-CoV 

326 (He et al., 2006; Zhu et al., 2007). Several conformational epitopes (I-VI) have been defined 

327 in the RBD, some of which are conserved in different species (He et al., 2006). Epitopes of 

328 several neutralizing Ab have been identified by crystal structures of RBD-Ab complexes 

329 (Hwang et al., 2006; Pak et al., 2009; Prabakaran et al., 2006), which show that they overlap 

330 with the receptor-binding region (Fig. 5). 

331 Neutralizing Ab bind to the RBD external subdomain that contacts ACE2 (Fig. 5). The 

332 human mAb m396 is a potent neutralizing Ab of several zoonotic and human SARS-CoV 
333 (Zhu et al., 2007); it targets a region in the C-terminal side of the RBD inserted subdomain

334 (residues 482-491) that is involved in ACE2 recognition, as well as residues in the RBD core 

335 (Fig. 5) (Prabakaran et al., 2006). The mAb epitope includes RBD residues Ile489 and

336 Tyr491, which contact the receptor directly. A very similar epitope was described for the

337 mouse mAb F26G19 (Fig. 5) (Pak et al., 2009), which contacts residues 486 to 492 of the

338 RBD inserted subdomain and some regions of the core. Ile489 is a central residue in the

339 F26G19 epitope (Fig. 5, black). Epitopes of mAb m396 and F26G19 are thus very similar,

340 and include an exposed ridge in the RBD ACE2-binding region (Fig. 5); this S region must 

341 be a hot spot for SARS-CoV neutralization. 

342 The crystal structure of the human R80 mAb shows a distinct mode of SARS-CoV

343 neutralization that also prevents virus binding to ACE2 (Fig. 5) (Hwang et al., 2006). The

344 R80 variable domains make extensive contact with the concave region of the RBD-inserted

345 subdomain (Fig. 5), mimicking the way that RBD and ACE2 interact. The R80 epitope in the

346 RBD overlaps with the region buried by the N-terminal α-helix of the receptor. The total

347 surface buried by the R80-RBD interaction is larger than the ACE2-RBD surface and is

348 responsible for its high affinity (in the nanomolar range). This mAb makes contact with 29

349 residues of the receptor-binding subdomain, 17 of which are involved in ACE2 recognition 

350 (Hwang et al., 2006). 

351 All three SARS-CoV-neutralizing mAb epitopes overlap with the receptor-binding region

352 in the S protein (Fig. 5); efficient virus neutralization is thus achieved by targeting receptor-

353 binding residues and blocking virus binding to ACE2 and, consequently, cell entry. Virus mutants

354 have been identified that escape mAb neutralization, although these mutants usually cause

355 attenuated infection (Rockx et al., 2010); some of the escape mutations map to the RBD 

356 inserted subdomain (Fig. 5) and probably affect SARS-CoV binding to ACE2. 

357 Receptor recognition by the MERS-CoV 
358 MERS-CoV arose recently as a highly pathogenic virus in humans (Coleman and Frieman,

359 2014); it is thought to have emerged from bats and is transmitted to humans via dromedary 

360 camels. Cross-species transmission is determined mainly by the adaptability of this CoV for 

361 different hosts, mediated by subtle modifications in its envelope S protein. MERS and 

362 SARS-CoV RBD are structurally similar (Fig. 6), but use different cell entry receptors; 

363 MERS-CoV attaches to a distinct ectoenzyme, DPP4 (Raj et al., 2013). Several crystal

364 structures have defined MERS-CoV RBD and how it binds to its DPP4 receptor (Chen et al., 

365 2013; Lu et al., 2013; Wang et al., 2013). 

366 The MERS-CoV RBD 

367 The MERS-CoV RBD is a fragment in the S1 region C-terminal portion (Fig. 1); its

368 structure is remarkably similar to the SARS-CoV RBD (Fig. 6) (rmsd of 2.4 Å for 132

369 residues), although they show little sequence identity. The MERS-CoV RBD also has two

370 subdomains (Fig. 6A), the core with a central five-stranded β-sheet and three disulphide

371 bridges, as well as an inserted or external subdomain between two core β-strands (Chen et al.,

372 2013; Lu et al., 2013; Wang et al., 2013). The central β-sheet of the core is surrounded by

373 polypeptides that connect the β-strands and contain helical structures (Fig. 6A). The core has

374 an overall globular shape. The inserted subdomain is distal from the RBD terminal side and

375 has a four-stranded β-sheet (Fig. 6A). The β-sheet and a long loop that connects the β-strands

376 at one edge of the sheet clamp the core subdomain, as in the SARS-CoV RBD (Fig. 4A).

377 The cores are more similar in MERS- and SARS-CoV than the external subdomain (Fig. 6B),

378 which is longer in the MERS (80 residues) than the SARS RBD (65 residues). Because of

379 the extended β-sheet, the solvent-exposed region of the inserted subdomain is broader than

380 that of SARS-CoV. The first (β6) and last (β9) β-strands of the MERS-CoV inserted

381 subdomain align with the two β-strands of the SARS-CoV inserted subdomain, but the other

382 two β-strands (β7 and β8) are absent in the SARS RBD (Fig. 6B). The MERS-CoV inserted
383 subdomain contains a concave surface or small “canyon” formed by the β-strands and the

384 loop that connects β6 and β7 in the inserted RBD subdomain (Fig. 6A). This “canyon” is very

385 distant from the terminal side and exposed for receptor recognition. It is absent in the SARS- 

386 CoV RBD, which contains a long loop in this location (Fig. 6B). Likely, these differences in 

387 the external subdomains are the major determinants of the distinct receptor-binding 

388 specificity between the MERS- and SARS-CoV. 
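
The rmsd value quoted above for the MERS- and SARS-CoV RBD comparison is a root-mean-square deviation over pairs of equivalent atoms (typically Cα) in two superimposed structures. The following is a minimal sketch of that calculation, assuming the superposition and the residue pairing have already been done elsewhere; the function name and the toy coordinates are hypothetical and are not taken from the RBD structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in the units of the input, e.g. Å)
    between two equally long lists of paired (x, y, z) coordinates.

    Assumption: the two structures are already superimposed and the
    i-th entry of each list refers to equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example (hypothetical coordinates): every atom is displaced by 2 Å along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(structure_a, structure_b))
```

In practice the pairing step matters as much as the arithmetic: the value depends on which residues are treated as equivalent, which is why the number of aligned residues is reported alongside the rmsd.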

389 MERS-CoV binding to its DPP4 receptor 

390 DPP4 or CD26 is a multifunctional membrane-bound serine protease (Boonacker and Van 

391 Noorden, 2003). DPP4 is a type II membrane protein that forms homodimers on the surface 

392 of different cells (Fig. 7A). The DPP4 ectodomain has ~730 amino acids and is composed of

393 two domains, an α/β-hydrolase domain and an eight-bladed β-propeller (Fig. 7A). The

394 substrates bind to a pocket in a central cavity formed between the two domains (Boonacker

395 and Van Noorden, 2003). The MERS-CoV contacts only the β-propeller domain (Fig. 7A,

396 green). 

397 Crystal structures of the MERS-CoV RBD bound to DPP4 demonstrate that the virus 

398 attaches to the most membrane-distal region of the β-propeller (Lu et al., 2013; Wang et al.,

399 2013). One RBD binds to each of the DPP4 monomers in the dimer, away from the receptor 

400 dimerization interface (Fig. 7A). This dimeric virus-receptor complex is similar to the alpha-CoV

401 RBD-APN structure described above (Fig. 3A). The bound RBD does not appear to

402 interfere with DPP4 catalytic activity, as it binds only to the β-propeller subdomain, away

403 from the regions at which the substrate accesses the active site.

404 The MERS-CoV RBD engages the DPP4 molecule through the solvent-exposed side of its 

405 external subdomain (Fig. 6A, 7A). It contacts the edges of DPP4 β-propeller blades IV and

406 V, including N-linked carbohydrates at blade IV and a helix at the linker between the two 

407 blades (Fig. 7B). It is the largest CoV-receptor interface, and buries 32 residues of the RBD
408 (~1110 Å² surface) and of DPP4 (~1240 Å² surface). In the two structures reported (PDB ID

409 4KRO and 4L72) (Lu et al., 2013; Wang et al., 2013), the interaction includes between 9 and

410 14 hydrogen bonds and 2 to 3 salt bridges. A key spot in this virus-receptor interaction 

411 includes the RBD contact with the helix that bulges out at the N-terminus of blade V (Fig. 

412 7B). The small “canyon” in the inserted subdomain cradles the DPP4 helix. This helix 

413 contains mostly hydrophobic residues (Ala291, Leu294, Ile295) that lie on a hydrophobic 

414 patch in the RBD “canyon”, composed of the side chains of Lys502, Leu506, Tyr540, 

415 Arg542, Trp553 and Val555, residues located in the three main β-strands of the subdomain

416 (Fig. 7B). The side chain amino groups of Lys504 and Arg542 are hydrogen-bonded to the 

417 main chain of DPP4. The loop at one rim of the small “canyon” forms polar interactions with 

418 the DPP4 β-strands in blade V (Fig. 7B).
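
Counts of salt bridges such as those quoted above are usually obtained by applying a distance criterion to oppositely charged side-chain atoms across the interface. The sketch below illustrates the idea under stated assumptions: the 4.0 Å cutoff is a common convention and not necessarily the criterion used in the cited structures, the function name is hypothetical, and the coordinates are toy values rather than data from the RBD-DPP4 complex.

```python
import math

# Charged side-chain atoms, using standard PDB residue and atom names.
POSITIVE = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}
NEGATIVE = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}


def salt_bridges(chain_a, chain_b, cutoff=4.0):
    """Return sorted residue pairs (one from each chain) linked by a salt bridge.

    Each atom is a tuple (resname, resnum, atomname, (x, y, z)).
    Assumption: a salt bridge is any pair of oppositely charged atoms
    within `cutoff` Å of each other.
    """
    pairs = set()
    for res_a, num_a, atom_a, xyz_a in chain_a:
        for res_b, num_b, atom_b, xyz_b in chain_b:
            opposite = (
                ((res_a, atom_a) in POSITIVE and (res_b, atom_b) in NEGATIVE)
                or ((res_a, atom_a) in NEGATIVE and (res_b, atom_b) in POSITIVE)
            )
            if opposite and math.dist(xyz_a, xyz_b) <= cutoff:
                pairs.add(((res_a, num_a), (res_b, num_b)))
    return sorted(pairs)


# Toy interface (hypothetical residues and coordinates).
virus = [("LYS", 1, "NZ", (0.0, 0.0, 0.0)), ("ARG", 2, "NH1", (10.0, 0.0, 0.0))]
receptor = [("ASP", 7, "OD1", (2.9, 0.0, 0.0)), ("GLU", 8, "OE2", (10.0, 6.0, 0.0))]
print(salt_bridges(virus, receptor))
```

Because the result depends on the cutoff and on small coordinate differences, two structures of the same complex can plausibly yield slightly different counts, which is consistent with the ranges reported for the two structures.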

419 An interesting feature of MERS-CoV binding to DPP4, also shown in the PRCV-APN 

420 complex (Fig. 3A, bottom), is the RBD interaction with N-linked receptor carbohydrates 

421 (Fig. 7B). The first three carbohydrates attached to DPP4 Asn229 are well defined in the 

422 crystal structures of the MERS-CoV RBD-DPP4 complex (Lu et al., 2013; Wang et al., 

423 2013). They interact with several solvent-exposed residues in the virus protein (Fig. 7B). 

424 The first NAG residue is hydrogen-bonded to RBD Glu536, whereas the second NAG of the 

425 glycan stacks onto the aromatic ring of viral Trp535, which strengthens the glycan-virus 

426 interaction and probably stabilizes motif conformation. The third mannose residue in the 

427 DPP4 N-linked glycan also interacts with the RBD tryptophan. Another glycan at DPP4 

428 Asn281 in blade IV is very close to the RBD (not shown), but does not interact with the virus 

429 protein. The conformation of this last glycan appears to be determined by its interaction with 

430 a tryptophan residue (Trp187) in the DPP4 protein (Lu et al., 2013; Wang et al., 2013), and

431 could be critical for MERS-CoV RBD binding to DPP4. A highly flexible glycan in this
432 position could prevent this virus-receptor interaction, as shown for glycosylation in the

433 APN or ACE2. 

434 The S glycoprotein N-terminal domain (NTD) in CoV receptor recognition 

435 Folding of the NTD 

436 The S glycoprotein NTD can mediate attachment of CoV particles to cell surface 

437 molecules (Peng et al., 2011; Peng et al., 2012; Schultze et al., 1996; Tsai et al., 2003). It 

438 thus functions as an RBD (N-RBD) in certain CoV. The crystal structure of the MHV NTD

439 shows a galectin-like fold (Fig. 8) (Peng et al., 2011); the homologous bCoV NTD also has a 

440 galectin fold (Peng et al., 2012). The galectins are a family of lectins with a common 

441 β-sandwich carbohydrate recognition domain (CRD) (Fig. 8A). They preferentially

442 recognize N-acetyl lactosamine in cell surface proteins, which binds to conserved residues on

443 one of the CRD β-sheets (Fig. 8A). The CoV NTD is also composed of a central β-sandwich

444 formed by two long β-sheets with six and seven β-strands that is structurally similar to the

445 galectin CRD (Fig. 8B). 

446 The CoV is thought to incorporate this N-terminal galectin-like domain from the host 

447 (Peng et al., 2012). In several CoV such as TGEV, beta1-CoV and IBV, the NTD preserves

448 glycan binding activity, whereas in MHV it binds to a protein receptor, CEACAM1 (Peng et 

449 al., 2012; Tsai et al., 2003). The CoV NTD has diverged from galectins and recognizes 

450 proteins or sialic acids rather than N-acetyl lactosamine; the mode of ligand recognition also 

451 differs (Peng et al., 2012). Although the side of the NTD that recognizes cell surface 

452 molecules is the same side as the galectin CRD that binds carbohydrates, the top of the 

453 carbohydrate-binding β-sheet is covered by polypeptides that shape the receptor-binding

454 region in CoV (Fig. 8, 9A). In addition, a glycan N-linked to one edge of the β-sheet further

455 prevents ligand binding to the carbohydrate-binding sheet in galectins. This region is similar
456 in MHV, which binds CEACAM1, and in bCoV, which binds sialic acid, showing that the

457 NTD has evolved in CoV to specifically select cell entry receptors. 

458 MHV binding to the CEACAM1 receptor 

459 MHV is a prototype beta-CoV of the A lineage. It uses CEACAM receptors to enter host 

460 cells (Williams et al., 1991). CEACAM are type I membrane proteins of the immunoglobulin 

461 superfamily (IgSF), markers of colorectal tumors that contribute to tumorigenesis 

462 (Beauchemin et al., 1999); in contrast to other CoV receptor proteins, they are not peptidases. 

463 The CEACAM mediate homo- and heterophilic cell adhesion. There are two murine 

464 CEACAM genes, CEACAM1 and CEACAM2. CEACAM1 has four splice forms, which have 

465 two (D1, D4) or four (D1-D4) Ig-like domains in the extracellular region, as well as a

466 transmembrane region and two distinct cytoplasmic tails (Beauchemin et al., 1999). All four 

467 CEACAM1 variants can be used as receptors by MHV (Dveksler et al., 1993). CEACAM1 

468 is also a receptor for virulent Neisseria strains (Virji et al., 1999). 

469 CEACAM1 is a member of the IgSF, and the MHV S protein recognizes the N-terminal

470 Ig-like domain 1 (D1), which adopts a variable (V) fold (Tan et al., 2002). The virus

471 interacts with the CFG β-sheet of D1 (Fig. 9B), the surface commonly engaged in

472 intermolecular interactions by cell surface molecules of the IgSF. The CFG β-sheet is formed

473 by the β-strands C, C', C" on one side and the β-strands F and G on the other (Fig. 9B).

474 About 25 receptor residues, 770 Å² of its surface, are buried by the MHV protein. Most of

475 the virus-binding residues locate at the D1 C" edge and around the FG loop. CEACAM1 has

476 a unique CC' loop that protrudes from the CFG β-sheet of the Ig-like domain (Tan et al.,

477 2002). This is a key structural determinant for CEACAM1 recognition by the MHV S 

478 protein (Peng et al., 2011). 

479 The CEACAM1-binding surface is on top of the galectin-like β-sandwich in the MHV

480 N-RBD (Fig. 9A). The N-terminal portion of the MHV N-RBD structure occupies the top of
481 the receptor-binding surface and contributes 50% of the 24 MHV residues buried by

482 interaction with the receptor. The N-terminal residues form a “socket” that contains a

483 hydrophobic amino acid, Leu160, at the bottom (Fig. 9A). Ile41 of CEACAM1 is exposed in

484 the D1 CC' loop and penetrates the socket (Fig. 9B). MHV Tyr15, Leu89 and Leu160

485 contact the Ile41 side chain (Fig. 9C), and comprise a critical virus-receptor motif (Peng et

486 al., 2011; Tan et al., 2002). Surrounding residues in the CEACAM1 CC' loop, Thr39 and

487 Asp42, form hydrogen bonds with the MHV N-RBD (Fig. 9C), which confirms the 

488 importance of this receptor region in virus recognition. 

489 The N-terminal portion of the MHV N-RBD also contacts other motifs in the C" edge of 

490 D1. In the C' β-strand, CEACAM1 Arg47 contributes to binding and establishes hydrogen

491 bonds with the main chain carbonyl oxygens of MHV N-terminal residues. Up to 10 polar

492 virus-receptor interactions contribute to virus-receptor specificity. MHV N-terminal residues

493 interact extensively with the receptor C" β-strand, which runs parallel to the β1-strand of the

494 virus domain. Phe56 in the C" β-strand appears to be an important residue for the interaction

495 and establishes van der Waals contacts with the virus protein. Another important receptor-binding

496 motif surrounds MHV Leu174 and contacts the loops at the top of the CFG β-sheet

497 (Fig. 9B). This N-RBD region protrudes slightly and is distant from the socket. 

498 The crystal structure of the MHV NTD in complex with CEACAM1 shows how the 

499 N-terminal module of a CoV S recognizes a protein receptor. This region has been 

500 implicated in the recognition of sialic acids in alpha- (TGEV), beta- (bCoV) and gamma- 

501 (IBV) CoV (Fig. 1). The NTD of these CoV were proposed to have a similar fold, which was

502 confirmed by the crystal structure of the bCoV NTD (Peng et al., 2012). As in the MHV 

503 structure (Fig. 8), the bCoV NTD has polypeptides on the top of the galectin-like β-sandwich.

504 The bCoV NTD structure nonetheless lacks the MHV NTD socket, a critical motif for 

505 CEACAM1 binding. Differences in the conformation of exposed NTD regions could be
506 responsible for the distinct receptor-binding specificity observed among CoV that use the

507 N-terminal module to bind to cell surface molecules. 

508 Discussion

509 The structural studies reviewed here established the basis for understanding receptor 

510 recognition diversity in CoV, its evolution and its adaptation to different hosts. CoV RBD 

511 folding, conformation of receptor-binding motifs and subtle changes in those motifs 

512 determine receptor binding specificity and CoV host range. Two domains of the 

513 multifunctional CoV S glycoprotein anchor the virus particles to cell surface molecules for 

514 virus penetration of cells (Fig. 1). The two domains might be exposed in the S1 region for

515 CoV binding to host cell entry receptors (Fig. 10). 

516 The S glycoprotein NTD can function as an RBD in certain CoV (Fig. 1), and might have 

517 a conserved fold in alpha-, beta- and gamma-CoV (Peng et al., 2012). This domain has a 

518 galectin-like core, which indicates it was incorporated into the CoV S from a host (Li, 2012; 

519 Peng et al., 2011). It has evolved in some CoV to recognize cell surface molecules such as 

520 sialic acids, or in MHV to bind the CEACAM1 protein (Fig. 1). CoV NTD has integrated 

521 polypeptides and an N-linked glycan on the top of the flat galectin-like β-sandwich, which

522 cover the galactose-binding β-sheet in galectins (Fig. 8). The virus-specific conformation of

523 the polypeptides at the top of the NTD probably determines its receptor-binding specificity

524 (Peng et al., 2012). The MHV NTD contains a socket for specific recognition of a unique 

525 structural feature in the CEACAM1 D1 (Fig. 9). Acquisition of the galectin-like NTD from 

526 the host probably expanded CoV host cell tropism, as shown for the TGEV NTD that confers 

527 enteric tropism (Schultze et al., 1996), although MHV and related beta-CoV only use the 

528 NTD for recognition of cell surface proteins (Fig. 1). The receptor-binding function of the S1

529 C-terminal portion appears to have been lost in these CoV. It would be interesting to explore 

530 the conformation of this region, which could provide clues to its presumed lack of function. 

531 The S1 C-terminal RBD have unique structures unrelated to host proteins (Chen et al.,

532 2013; Li et al., 2005a; Peng et al., 2011; Reguera et al., 2012; Wu et al., 2009) and can thus
533 be considered genuine CoV RBD. Alpha- and beta-CoV RBD adopt two distinct folds, but

534 bind only to ectoenzymes; the NL63- and SARS-CoV bind to the same protein, ACE2 

535 (Fig. 1). Some of these enzyme features must be essential for CoV entry into host cells. 

536 Perhaps they cluster with other proteases that facilitate fusion (Glowacka et al., 2011). 

537 ACE2, APN and DPP4 have distinct structures and functions, but their ectodomains share an 

538 inherent conformational flexibility (Boonacker and Van Noorden, 2003; Towler et al., 2004; 

539 Xu et al., 1997) that could assist in dissociation of the S1-S2 heterotrimer. Trimeric spikes 

540 that bind simultaneously to several receptor molecules could disassemble by pulling forces 

541 generated during ectodomain movement. The conformation and dynamics of the APN 

542 ectodomain vary with the pH (unpublished data), so that endosomal acidification can alter 

543 APN conformation during receptor-mediated endocytosis. 

544 Alpha-CoV RBD adopt a conserved β-barrel fold (Fig. 2) (Reguera et al., 2012; Wu et al.,

545 2009). S1 C-terminal fragments of the IBV gamma-CoV and the bulbul delta-CoV share

546 certain sequence similarity with the alpha-CoV RBD, and could have a similar fold (Reguera 

547 et al., 2012). Crystal structures of alpha-CoV in complex with receptors identified the 

548 receptor-binding region in the RBD (Reguera et al., 2012; Wu et al., 2009), which has 

549 remarkable structural variability (Fig. 2). The conformation of the RBD tip dictates the 

550 receptor molecule used by alpha-CoV for host cell entry. RBD with protruding tips 

551 determine alpha-CoV attachment to APN, whereas those with blunt RBD tips recognize 

552 ACE2 and perhaps other yet uncharacterized receptor molecules. Structures of alpha-CoV 

553 RBD in complex with APN or ACE2 show two opposite modes of CoV-receptor recognition 

554 (Reguera et al., 2012; Wu et al., 2009) (Fig. 3). In viruses, recessed surfaces hide conserved 

555 receptor-binding residues from antibodies (Casasnovas, 2013; Rossmann, 1989); hCoV-NL63

556 uses a recessed surface to recognize exposed ACE2 motifs, following a receptor-binding

557 strategy similar to that of the beta-CoV reviewed here. The mode of binding to APN is unique
558 among CoV, and contrasts with the mode of ACE2 and DPP4 recognition. The bidentate,

559 protruding RBD tip of alpha1-CoV, which has two exposed aromatic residues, penetrates

560 small cavities of the APN ectodomain (Fig. 3A). Similarly to MERS-CoV, the alpha1-CoV

561 also recognize a dimeric cell surface protein. 

562 The folding of the MERS- and SARS-CoV RBD is similar (Fig. 6) (Chen et al., 2013; Li

563 et al., 2005a; Lu et al., 2013; Wang et al., 2013). Both have a core with a single β-sheet and

564 an additional subdomain that recognizes cell entry molecules. MERS- and SARS-CoV show 

565 extraordinary potential for cross-species transmission, related to S binding to distinct 

566 orthologous receptor molecules. This is probably linked to the specific structure of their 

567 RBD, especially to the extended receptor-binding surfaces of the inserted subdomains 

568 (Fig. 4A, 6A). A few changes in those large surfaces increase affinity for receptor molecules 

569 in new hosts, while preserving virus growth (Holmes, 2005; Li et al., 2005c). Measles virus

570 (MV) follows a similar strategy for recognition of several receptors that facilitate virus 

571 growth and transmission (Casasnovas, 2013). The MV hemagglutinin uses a broad concave 

572 surface to bind to three distinct receptor molecules, a unique feature of MV among the 

573 paramyxoviruses. The use of a large receptor recognition surface probably enables virus 

574 dissemination in tissues and host-to-host virus transmission. 

575 The DPP4-binding surface in the MERS-CoV is larger (by ~300 Å²) than the ACE2-binding

576 surface in SARS-CoV, which correlates with a larger RBD inserted subdomain. The two 

577 CoV use concave surfaces to bind different receptors. MERS-CoV uses a small “canyon” to 

578 bind to an α-helix in the linker between blades IV and V of the DPP4 β-propeller (Fig. 7),

579 whereas the curved inserted subdomain in SARS-CoV RBD cradles the N-terminal α-helix of

580 ACE2 (Fig. 4). The mode by which these CoV bind to receptors shows similarities to other 

581 CoV-receptor interactions, particularly to hCoV-NL63, which also binds to ACE2 (Fig. 3B). 

582 NL63- and SARS-CoV recognize overlapping ACE2 regions, including two helices and a
583 β-turn in the virus-binding lobe of the receptor. The ACE2-binding surfaces in both CoV are

584 concave and are distant from the terminal end of the RBD. The receptor-binding surface in 

585 SARS-CoV is more extended and curved than in hCoV-NL63, and interacts more extensively 

586 with the ACE2 N-terminal α-helix; the two residues involved in SARS-CoV adaptation to

587 humans (Asn479 and Thr487) interact directly with the α-helix.

588 MERS- and alpha1-CoV share recognition of carbohydrates N-linked to their receptors

589 (Fig. 3A, 7B) (Lu et al., 2013; Reguera et al., 2012; Wang et al., 2013). In APN, the N-linked

590 glycan is essential for binding and infection of TGEV and related alpha-CoV (Reguera

591 et al., 2012; Tusell et al., 2007). Receptor glycosylations are important determinants of CoV- 

592 receptor recognition, as they can promote or hinder CoV binding to cell entry receptors in 

593 certain species (Holmes, 2005; Tusell et al., 2007), which delimits CoV host range. 

594 The CoV RBD is a major target of neutralizing Ab that prevent virus infection by blocking 

595 virus binding to receptors (Hwang et al., 2006; Pak et al., 2009; Prabakaran et al., 2006; 

596 Reguera et al., 2012; Zhu et al., 2007). RBD protein can elicit potent neutralizing Ab and 

597 protective immune responses (Du et al., 2009). These neutralizing Ab recognize the exposed 

598 receptor-binding tyrosine or tryptophan in TGEV or PRCV (Reguera et al., 2012). In the 

599 SARS-CoV, structural studies showed that several neutralizing Ab bind to the receptor- 

600 binding subdomain (Fig. 5) (Hwang et al., 2006; Pak et al., 2009; Prabakaran et al., 2006). 

601 These results indicate that the receptor-binding regions are under selective pressure from the 

602 immune system. In alpha-CoV, this pressure could mediate the notable conformational 

603 changes in the RBD tip (Fig. 2), which alter receptor-binding specificity. The APN-binding 

604 tip in alpha-CoV RBD has exposed receptor-binding residues that are easily targeted by Ab, 

605 whereas the recessed ACE2-binding tip in hCoV-NL63 more efficiently hides conserved 

606 receptor-binding residues from immune surveillance.
607 Alpha- and beta-CoV RBD folds are distinct but are both unique, with no known homology

608 to host domains (Chen et al., 2013; Li et al., 2005a; Peng et al., 2011; Reguera et al., 2012;

609 Wu et al., 2009). They are thought to have evolved from a common CoV RBD ancestor (Li,

610 2012). They share some common features, such as recognition of glycans N-linked to

611 receptors, and the presence of parallel β-strands (β2-β11 in MERS and β1-β3-β7 in TGEV,

612 Fig. 2A, 6A). It is tempting to speculate that this precursor RBD had a β-barrel fold similar

613 to the alpha-CoV, with a variable tip that accommodated different receptor molecules. In

614 SARS and MERS beta-CoV, the RBD lost the β-barrel fold, but maintained two β-sheets, one

615 of which forms a large receptor-binding platform with recessed surfaces that bind to specific 

616 motifs in receptor molecules. The receptor-binding subdomains in SARS and MERS beta- 

617 CoV appear to specialize in recognition of orthologous receptor molecules. The beta-CoV 

618 RBD probably evolved to enhance host-to-host transmission, responsible for the recurrent 

619 CoV outbreaks in man. 

620 Structural studies reviewed here have established the basis for understanding receptor 

621 recognition diversity in CoV, its evolution and adaptation to different hosts. These studies 

622 have identified sites of vulnerability in the CoV S that should guide the development of anti- 

623 virals and vaccines to prevent CoV infections. 

624 Analysis and representation of crystal structures 

625 Buried surfaces and residues at the molecular complex interfaces were determined with the 

626 PISA server (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html). Figure 2A was prepared 

627 with Ribbons (Carson, 1987), Figure 10 with Chimera (Pettersen et al., 2004) and the other 

628 structure representations with PyMOL software (pymol.org). 
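
As an illustration of what a buried surface means in this context, the surface buried on each partner is commonly defined as its solvent-accessible surface area (SASA) when isolated minus its SASA within the complex. The sketch below is a toy numerical version of that definition (a Shrake-Rupley style point-sampling estimate); it is not the procedure implemented by the PISA server, and the function names, the 1.4 Å probe radius, the number of sampling points and the two-atom example are all assumptions made for illustration.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def atom_sasa(atoms, probe=1.4, n_points=200):
    """Per-atom solvent-accessible area (Å^2) for atoms given as (x, y, z, radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-expanded sphere
        exposed = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            # A sample point is accessible if it lies outside every other expanded sphere.
            if all(math.dist(p, (xj, yj, zj)) >= rj + probe
                   for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


def buried_surface(part_a, part_b, **kwargs):
    """Surface buried on each partner: SASA when isolated minus SASA in the complex."""
    free_a = sum(atom_sasa(part_a, **kwargs))
    free_b = sum(atom_sasa(part_b, **kwargs))
    bound = atom_sasa(part_a + part_b, **kwargs)
    return free_a - sum(bound[:len(part_a)]), free_b - sum(bound[len(part_a):])


# Toy "complex" of two single-atom partners (hypothetical coordinates and radii).
ligand = [(0.0, 0.0, 0.0, 1.8)]
receptor = [(3.5, 0.0, 0.0, 1.8)]
print(buried_surface(ligand, receptor))
```

With real coordinates the same subtraction would be run over all atoms of the RBD and the receptor; the values it returns depend on the atomic radii, probe size and sampling density chosen, so they should not be expected to match server output exactly.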
900 Figure legends

901 Figure 1. The CoV S glycoprotein and CoV cell surface receptors 

902 Scheme of a CoV S glycoprotein with the functional domains in the S1 and S2 regions, which

903 are exposed in the virus envelope. The N-terminal signal peptide and the transmembrane 

904 region are also shown. The N-terminal domain (NTD) that can act as a receptor-binding 

905 domain (N-RBD) and the canonical CoV RBD in the C-terminal portion of S1 are indicated.

906 The heptad repeat regions (HR1 and HR2) and the putative fusion peptide (FP) are marked in 

907 S2. The arrowhead indicates the putative protease cleavage site in some CoV. Cell entry 

908 receptor molecules identified for the indicated CoV (right) are shown beneath their respective 

909 RBD regions. Sialic acids recognized by TGEV and IBV should be considered attachment 

910 factors. 

911 Figure 2. Structures of alpha-CoV RBD and receptor-binding surfaces 

912 A. Ribbon diagram of the TGEV RBD structure (PDB ID 4F2M) (Reguera et al., 2012). 

913 β-strands (numbered) are shown in light or dark blue, coils in orange, and the helix in red; a
914 β-bulge at β-strand 5 is in magenta. N- and C-terminal ends on the terminal side of the

915 structure are indicated in lowercase letters. The Asn residues at glycosylation sites and the 

916 attached glycans defined in the structure are shown as a ball-and-stick model, with carbons in 

917 yellow. Cysteine residues and disulphide bonds are shown as green cylinders. The two 

918 β-turns at the β-barrel domain tip are labeled. Ribbon diagrams of the PRCV and hCoV-NL63

919 RBD structures are shown in B and C, respectively. The structures of these domains

920 were determined in complex with the APN (PRCV, PDB ID 4F5C) and ACE2 receptors 

921 (NL63, PDB ID 3KBH) (Reguera et al., 2012; Wu et al., 2009). Receptor-binding surfaces in 

922 the RBD are shown in pink or red (tyrosine or tryptophan residues) and were generated by 

923 the RBD residues that contact the respective receptor molecules in the structures. 

924 Figure 3. Alpha-CoV recognition of cell surface receptors 

925 Crystal structures of alpha-CoV RBD in complex with the ectodomains of APN (A) and 

926 ACE2 (B). 

927 A. Ribbon drawing of the dimeric structure of the PRCV RBD-APN complex (PDB ID 

928 4F5C) (Reguera et al., 2012). Pig APN molecules are shown with domains in orange 

929 (N-terminal DI), yellow (DII), red (DIII) and green (C-terminal DIV), as well as the

930 N-terminal ends near the putative location of the cell membrane. The RBD is shown as 

931 ribbon and surface drawings in blue and cyan, with the APN-binding tyrosine and tryptophan 

932 residues at the RBD tip in red. 

933 B. Ribbon drawing of the hCoV-NL63 RBD-ACE2 complex (PDB ID 3KBH) (Wu et al., 

934 2009). The ACE2 molecule is shown with the two lobes in green (N-terminal) and orange 

935 (C-terminal). The RBD is shown as ribbon and surface drawings in blue, with the ACE2- 

936 binding residue in pink and the aromatic residues that contact the receptor in red. The N- and 

937 C-terminal ends of the receptor molecules are marked in lowercase letters, N-linked glycans
938 are shown as sticks with carbons in yellow, and the zinc ion at the catalytic sites of APN and

939 ACE2 as cyan spheres. 

940 For A and B, details of key virus-receptor binding motifs are shown beneath the complex 

941 structures. Interaction of the PRCV RBD β1-β2 and β3-β4 turns (shown as sticks) at the

942 domain tip with cavities in the APN (ribbon and surface drawings). The tyrosine at the β1-β2

943 turn contacts APN residues and the NAG carbohydrate (yellow surface), which is N-linked to

944 pig APN Asn736. The tryptophan side chain at the β3-β4 turn penetrates between DII and

945 DIV. Interaction of the concave center of the hCoV-NL63 RBD tip with the ACE2 β4-β5

946 turn. Lys535 at the tip of the ACE2 turn is labeled. The ACE2 α-helices α1 and α10 contact

947 the most exposed regions of the RBD loops. Side chains of buried residues in the virus-receptor

948 interfaces are shown with oxygens in red and nitrogens in blue in this and the

949 following figures; hydrogen bonds are dark dashed lines. 

950 Figure 4. SARS-CoV RBD and binding to ACE2 

951 A. Ribbon drawing of the SARS-CoV RBD (PDB ID 2AJF) (Li et al., 2005a), with the core 

952 subdomain in yellow and the inserted subdomain in dark red. The β-strands and α-helices are

953 labeled with numbers and uppercase letters, respectively. Terminal ends are labeled in 

954 yellow and disulphide bonds in green; Asn residues at glycosylation sites and the attached 

955 glycans are shown as sticks, with carbons in yellow. SARS-CoV residues that bind to the 

956 ACE2 receptor and define the receptor-binding surface are pink. 

957 B. Ribbon drawing of the SARS-CoV RBD-ACE2 complex (PDB ID 2AJF) (Li et al., 

958 2005a). ACE2 is shown as in Fig. 3B and the RBD as in panel A. The three main ACE2 

959 regions recognized by SARS-CoV are labeled in green. 

960 C. Key virus-receptor binding motifs. ACE2 residues are shown, with carbons in green. In 

961 the RBD, receptor-binding tyrosines and an arginine are shown, with carbons in pink,
962 whereas the two critical residues for SARS-CoV adaptation to human ACE2 (Asn479 and

963 Thr487) are shown, with carbons in magenta. 

964 Figure 5. SARS-CoV neutralizing Ab bind to the RBD. 

965 Ribbon drawing of the SARS-CoV RBD in complex with three neutralizing Ab (Hwang et 

966 al., 2006; Pak et al., 2009; Prabakaran et al., 2006). The three RBD-Ab crystal structures 

967 were superimposed based on the RBD. The RBD is shown as in Figure 4A and the variable 

968 domains of the Ab in green (R80, PDB ID 2GHW), blue (F26G19, PDB ID 3BGF) and cyan 

969 (m396, PDB ID 2DD8). RBD Ile489, which is recognized by the m396 and F26G19 Ab (Pak 

970 et al., 2009; Prabakaran et al., 2006), is black. Side chains of residues that change in

971 neutralization escape mutants are shown in red (Rockx et al., 2010).

972 Figure 6. The MERS-CoV RBD and comparison with the SARS RBD. 

973 A. Ribbon drawing of the MERS-CoV RBD (PDB ID 4KRO) (Lu et al., 2013), shown as for

974 SARS-CoV RBD in Fig. 4A, but with the core subdomain in dark yellow. MERS-CoV 

975 residues that bind to its DPP4 receptor define the receptor-binding surface (pink). The 

976 arrowhead indicates the small “canyon” on one side of the DPP4-binding surface. 

977 B. Stereo view of superimposed MERS- (yellow) and SARS-CoV (red) RBD, core 

978 subdomain-based. The β-strands of the MERS-CoV inserted subdomain are labeled and the

979 two conserved in the SARS-CoV are red. 

980 Figure 7. MERS-CoV RBD binding to DPP4 

981 A. Ribbon drawing of the dimeric MERS-CoV RBD-DPP4 complex structure (PDB ID 

982 4KRO) (Lu et al., 2013). The DPP4 monomers are shown with the N-terminal β-propeller

983 domain in green and the C-terminal α/β-hydrolase domain in orange. The RBD molecules are

984 as in Figure 6A. Labels and glycosylation are as in previous figures. 

985 B. Key virus-receptor binding motifs. The virus-binding DPP4 p-propeller blades IV and V 

986 are shown in light and dark green, respectively. DPP4 residues are shown, with carbons in 
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987 green. In the RBD, residues in the small “canyon” that interact with the exposed a-helix in 

988 the blade linker are shown, with carbons in magenta, whereas those that bind to the DPP4 

989 N-linked glycan (Asn229) are shown, with carbons in pink. Some residues in the two receptor- 

990 binding motifs and the external subdomain β-strands (β6-β9) are labeled. 

991 Figure 8. Structure of the S glycoprotein NTD 

992 A. Ribbon drawing of the human galectin-3 carbohydrate recognition domain (CRD) bound 

993 to galactose (PDB ID 1A3K) (Seetharaman et al., 1998). The β-strands in the β-barrel are in 

994 light or dark blue, and a galactose ligand on the top of the β-sheet is shown as sticks, with 

995 carbons in yellow. N- and C-terminal ends are indicated in lowercase letters. 

996 B. Ribbon drawing of the MHV NTD structure (PDB ID 3R4D) (Peng et al., 2011). The 

997 β-strands in the central galectin-like β-barrel are in light or dark blue, and those on the top of 

998 the sheet are in pink. The Asn residues at glycosylation sites and the attached glycans 

999 defined in the structure are shown as sticks, with carbons in yellow. Cysteine residues and 

1000 disulphide bonds are shown as green sticks. 

1001 Figure 9. MHV recognition of its CEACAM1 receptor 

1002 A. The MHV NTD structure with the CEACAM1-binding surface. The NTD ribbon 

1003 diagram is shown as in Figure 8B. The surface of the N-terminal MHV residues that form a 

1004 socket is shown in violet and that of the other receptor-binding residues is pink. MHV 

1005 Leu160 in the bottom of the socket is shown in red. 

1006 B. The MHV NTD in complex with the CEACAM1 receptor (PDB ID 3R4D) (Peng et al., 

1007 2011). The CEACAM1 N-terminal D1 is shown in green, with the β-strands in the receptor- 

1008 binding CFG β-sheet labeled. The side chain of CEACAM1 Ile41 that penetrates the NTD 

1009 socket is shown as spheres. The MHV Leu160 in the socket and Leu174 that contacts the top 

1010 of D1 are in red. 
1011 C. Key virus-receptor binding motifs. Side chains of some receptor-binding MHV residues 

1012 are shown, with carbons in pink; the hydrophobic residues in the bottom of the socket and 

1013 Leu174 are in magenta; the CEACAM1 residues are in green. Ile41 in the CC' loop, the 

1014 most important virus-binding motif in CEACAM1 (Peng et al., 2011), is shown as spheres. 

1015 Figure 10. Structural view of the multifunctional CoV S with the two domains that bind 

1016 to host cell surface receptors. The two domains, NTD and RBD, of the S1 region that CoV 

1017 use for attachment to cell surface molecules (Fig. 1) docked into the cryo-electron 

1018 microscopy map (grey) of the trimeric SARS-CoV S (EMD-1423) (Beniac et al., 2006). 

1019 Ribbon representations of the SARS-CoV RBD (yellow) and the MHV NTD (blue) alone or 

1020 bound to ACE2 (Fig. 4B) and to CEACAM1 D1 (Fig. 9B), respectively. 

1021 
• Structural basis of coronavirus attachment to host cell entry receptors. 

• Coronavirus-receptor complex structures. 

• Evolution of receptor-recognition in coronavirus. 

• Coronavirus host-to-host transmission and adaptation to man. 

• Sites of vulnerability in the coronavirus spike glycoprotein. 

• Antibody neutralization of coronavirus. 